Na Li and Matthew Stephens on Modeling Linkage Disequilibrium.
Author
Abstract
Probabilistic models have played an indispensable role in population genetics for close to a century. They provide a powerful lens through which one can investigate how various evolutionary forces interact and produce the intricate patterns of genetic variation in a population. Furthermore, through statistical inference, one can estimate evolutionary parameters from population genetic data and draw important biological conclusions. There is a formidable challenge in this effort, however. Although the models (e.g., diffusion processes and coalescent theory) are relatively straightforward to describe, computing the relevant likelihoods for statistical inference is often intractable. This is especially true when recombination is taken into account. Indeed, a far-reaching consequence of recombination is that different loci can have different evolutionary histories, and this complication leads to an overwhelming explosion in the dimensionality of the model.

In their influential work, Li and Stephens (2003) proposed a simple and elegant approach to circumvent this computational challenge, leading to a paradigm shift in modeling genetic relatedness with large-scale data. Their key methodological insight was to construct an approximate probabilistic model that captures the essential features of a genealogical process with recombination but produces a dramatic computational speed-up. Specifically, they considered the problem of approximating the conditional sampling probability (CSP) of the next haplotype given the haplotypes that have already been observed. A useful approximation had been proposed by Stephens and Donnelly (2000) for the simpler case of completely linked loci. They suggested approximating the next haplotype $h_k$ as an imperfect copy of one of the first $k-1$ haplotypes, $h_1, \ldots, h_{k-1}$, with copying errors corresponding to mutation.
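The "imperfect copy" idea for completely linked loci can be made concrete with a small sketch. The Python below is a minimal illustration and not Stephens and Donnelly's actual formula: it assumes biallelic sites coded 0/1, a uniform choice of template among the haplotypes already observed, and a single hypothetical per-site copying error probability `error_rate` standing in for mutation. The function name `csp_no_recombination` is also hypothetical.

```python
def csp_no_recombination(new_hap, panel, error_rate=0.01):
    """Approximate P(new_hap | panel) when new_hap copies ONE template end to end.

    Assumptions of this sketch (not taken from the source):
    alleles are 0/1, templates are chosen uniformly, and each site is
    miscopied independently with probability error_rate.
    """
    if not panel:
        raise ValueError("panel must contain at least one haplotype")
    total = 0.0
    for template in panel:
        # Probability of the observed alleles if this template was copied.
        prob = 1.0
        for observed, copied in zip(new_hap, template):
            prob *= (1.0 - error_rate) if observed == copied else error_rate
        # Each template is equally likely to be the one copied.
        total += prob / len(panel)
    return total


# Hypothetical toy data: two observed haplotypes and one new haplotype.
panel = [[0, 0, 1], [1, 1, 0]]
print(csp_no_recombination([0, 0, 1], panel, error_rate=0.05))
```

Because the whole haplotype is copied from a single template, the cost is linear in the number of templates and sites, but the sketch cannot represent a haplotype that is a mosaic of several templates.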
Fearnhead and Donnelly (2001) generalized this approach to incorporate recombination, assuming that haplotype $h_k$ is generated by copying segments from $h_1, \ldots, h_{k-1}$, where recombination can change the haplotype from which copying is performed. The associated CSP can be computed efficiently using standard methods (dynamic programming applied to a hidden Markov model). Li and Stephens (2003) proposed a modification to the Fearnhead and Donnelly approximation to obtain a simpler generative model, thus providing a computational speed-up. More important, they showed how the CSPs could be combined to approximate the likelihood of the haplotype data under a model that allowed recombination rates to vary over short distances. This permitted effective inference of the fine structure of recombination rate variation from population genomic data. This clever approach opened up new avenues of statistical inference in population genetics.

The modeling framework proposed by Li and Stephens has had a profound impact. Their "copying" model has been extended and applied to a wide range of problems, including inference of gene conversion parameters (Gay et al., 2007; Yin et al., 2009), recombination rates in admixed populations (Hinch et al., 2011; Wegmann et al., 2011), human colonization history (Hellenthal et al., 2008), fine population structure (Lawson et al., 2012), and local ancestry in admixed populations (Sundquist et al., 2008; Price et al., 2009). The model has...

Copyright © 2016 by the Genetics Society of America
doi: 10.1534/genetics.116.191817
Address for correspondence: Department of Statistics, University of California, Berkeley, 321 Evans Hall #3860, Berkeley, CA 94720-3860. E-mail: [email protected]
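The hidden Markov model computation described in the abstract, and the way successive CSPs are multiplied together, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the parameterization used by Li and Stephens: sites are biallelic (0/1), a hypothetical constant `switch_prob` stands in for the recombination-dependent probability of changing template between adjacent sites, and `error_rate` stands in for mutation. The function names are hypothetical, and the term for the first haplotype is omitted from the product, treating it as a constant.

```python
import math


def csp_copying_hmm(new_hap, panel, switch_prob=0.05, error_rate=0.01):
    """Forward algorithm for a copying HMM: returns P(new_hap | panel).

    Hidden state at each site = index of the template being copied.
    switch_prob and error_rate are assumed constants for this sketch.
    """
    if not panel:
        raise ValueError("panel must contain at least one haplotype")
    n = len(panel)

    def emit(site, j):
        # Probability of the observed allele given that template j is copied.
        return 1.0 - error_rate if new_hap[site] == panel[j][site] else error_rate

    # First site: every template is equally likely to be copied.
    forward = [emit(0, j) / n for j in range(n)]
    for site in range(1, len(new_hap)):
        total = sum(forward)
        # Either keep copying the same template, or switch and pick
        # a template uniformly at random (possibly the same one).
        forward = [
            ((1.0 - switch_prob) * forward[j] + switch_prob * total / n) * emit(site, j)
            for j in range(n)
        ]
    return sum(forward)


def pac_log_likelihood(haplotypes, switch_prob=0.05, error_rate=0.01):
    """Sum of log CSPs: each haplotype is conditioned on those before it.

    The first haplotype's term is left out (assumed constant here).
    Requires 0 < error_rate < 1 so that every CSP is positive.
    """
    log_lik = 0.0
    for k in range(1, len(haplotypes)):
        csp = csp_copying_hmm(haplotypes[k], haplotypes[:k], switch_prob, error_rate)
        log_lik += math.log(csp)
    return log_lik


# Hypothetical toy data: the third haplotype is a mosaic of the first two.
sample = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
print(pac_log_likelihood(sample, switch_prob=0.2, error_rate=0.01))
```

Each CSP costs on the order of the number of templates times the number of sites, because the sum of the forward values is computed once per site and shared by all states. The product of conditionals depends on the order in which the haplotypes are listed, so a value computed this way is specific to the chosen ordering.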
Similar resources
Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data.
We introduce a new statistical model for patterns of linkage disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the as...
Modelling Linkage Disequilibrium, And Identifying Recombination Hotspots Using SNP Data
We introduce a new statistical model for patterns of Linkage Disequilibrium (LD) among multiple SNPs in a population sample. The model overcomes limitations of existing approaches to understanding, summarizing, and interpreting LD by (i) relating patterns of LD directly to the underlying recombination process; (ii) considering all loci simultaneously, rather than pairwise; (iii) avoiding the as...
An approximate likelihood for genetic data under a model with recombination and population splitting.
We describe a new approximate likelihood for population genetic data under a model in which a single ancestral population has split into two daughter populations. The approximate likelihood is based on the 'Product of Approximate Conditionals' likelihood and 'copying model' of Li and Stephens [Li, N., Stephens, M., 2003. Modeling linkage disequilibrium and identifying recombination hotspots usi...
Linkage Disequilibrium-Based Quality Control for Large-Scale Genetic Studies
Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of "problem" SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phe...
Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation.
Although many algorithms exist for estimating haplotypes from genotype data, none of them take full account of both the decay of linkage disequilibrium (LD) with distance and the order and spacing of genotyped markers. Here, we describe an algorithm that does take these factors into account, using a flexible model for the decay of LD with distance that can handle both "blocklike" and "nonblockl...
Journal title:
- Genetics
Volume 203, Issue 3
Pages: -
Publication date: 2016